The thermodynamics of prediction 
A system responding to a stochastic driving signal can be interpreted as computing, by means of its
dynamics, an implicit model of the environmental variables. The system's state retains information 
about past environmental fluctuations, and a fraction of this information is predictive of future ones. 
The remaining nonpredictive information reflects model complexity that does not improve predictive 
power, and thus represents the ineffectiveness of the model. We expose the fundamental equivalence 
between this model inefficiency and thermodynamic inefficiency, measured by dissipation. Our 
results hold arbitrarily far from thermodynamic equilibrium and are applicable to a wide range 
of systems, including biomolecular machines. They highlight a profound connection between the 
effective use of information and efficient thermodynamic operation: any system constructed to keep 
memory about its environment and to operate with maximal energetic efficiency has to be predictive. 



All systems perform computations by means of re- 
sponding to their environment. In particular, living sys- 
tems compute, on a variety of length- and time-scales, fu- 
ture expectations based on their prior experience. Most 
biological computation is fundamentally a nonequilib- 
rium process, because a preponderance of biological ma- 
chinery in its natural operation is driven far from thermo- 
dynamic equilibrium. For example, many molecular ma- 
chines (such as the microtubule-associated motor kinesin) 
are driven by ATP hydrolysis, which liberates ~500 meV
per molecule [1]. This energy is large compared with
ambient thermal energy, $1\,k_B T \approx 25$ meV ($k_B$ is Boltzmann's
constant and the temperature is T ~ 300 Kelvin).
In general, such large energetic inputs drive the opera- 
tive degrees of freedom of biological machines away from 
equilibrium averages. 
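
As a rough numerical illustration of these energy scales, the following sketch evaluates $k_B T$ at 300 K and expresses the quoted ~500 meV per ATP in thermal units. The constant is the CODATA value of Boltzmann's constant in eV/K; the function and variable names are chosen here for illustration only.

```python
# Boltzmann constant in eV/K (CODATA value)
K_B_EV_PER_K = 8.617333262e-5

def thermal_energy_mev(temperature_kelvin: float) -> float:
    """Return k_B * T in milli-electronvolts."""
    return K_B_EV_PER_K * temperature_kelvin * 1e3

kT = thermal_energy_mev(300.0)   # thermal energy at T = 300 K, in meV
atp_mev = 500.0                  # approximate figure quoted in the text
ratio = atp_mev / kT             # ATP hydrolysis energy in units of k_B T
print(f"k_B T = {kT:.2f} meV, ATP hydrolysis ~ {ratio:.1f} k_B T")
```

The ratio of roughly twenty thermal units per hydrolysis event is what makes the far-from-equilibrium description necessary.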

Recently, significant progress has been made in describing
driven systems far from equilibrium [2], perhaps most notably
Jarzynski's work relation [3] generalizing Clausius' inequality,
the further generalization embodied in fluctuation theorems [4, 5],
and the extension of these relations to calculating potentials of
mean force [6]. These advances have allowed researchers to measure
equilibrium quantities, such as free energy changes, by observing
how a system reacts to being driven out of equilibrium, e.g. [7-10].

This literature typically assumes that the experiment
is known, i.e. that the exact time course of the driving
signal is given. However, systems that are embedded in
realistic environments, for example, a biological macromolecule
operating under natural conditions, are exposed
to stochastic driving. Here, we therefore study driven
systems for which the changes in the driving signal(s)
are governed by some probability density $P_X$. This can
be any stochastic process, and the results we derive require
neither that $P_X$ has specific properties, nor that
it is known by the system. We assume that there is no
feedback from the system to the driving signal. The dissipation,
averaged not only over the system's path through
its state space, but also over driving protocols, then quantifies
the system's energetic inefficiency.

The dynamics of the system perform a computation by 
changing the system's state, as a function of the driving 
signal. As a result, the new system state contains some 
memory about the driving signal. The system dynam- 
ics can be interpreted as computing a model: past envi- 
ronmental influences are mapped onto the current state 
of the system, which through its correlation with forth- 
coming environmental fluctuations implicitly contains a 
prediction of the future. 

In this paper, we ask how the quality of this (implicit)
model is related to thermodynamic efficiency. But how
do we measure the quality of a model? A useful model
has to have predictive power (see e.g. [11, 12], and refs.
therein), meaning it must capture predictive information
[13-16], while not being overly complicated. In other
words, the model should contain as little dispensable
nonpredictive information as possible.

Our central contribution is the demonstration of a fun- 
damental equivalence between the instantaneous nonpre- 
dictive information carried by the system and the dissi- 
pation of energy. 

Problem setup.- Let $s_t$ denote the state of the system
at time $t$, while $x_t$ denotes the driving signal. The
dynamics of the system are modeled by discrete-time
Markovian conditional state-to-state transition probabilities,
$p(s_t|s_{t-1},x_t)$. The external drive is governed by
$P_X = p(x_0,\ldots,x_\tau)$. We assume that at time $t = 0$, the
system is in thermodynamic equilibrium, in contact with
a heat bath with inverse temperature $\beta := 1/k_B T$. A
change in the external driving signal $x_0 \to x_1$ forces
the system out of equilibrium. The system responds by
changing its state $s_0 \to s_1$, according to the transition
probability $p(s_1|s_0,x_1)$. The external signal subsequently
changes again, $x_1 \to x_2$, and the process is repeated until
time $t = \tau$.



[Figure: each time step consists of a work step, $x_{t-1} \to x_t$, followed by a relaxation step, $s_{t-1} \to s_t$.]

The system remains in thermal contact with the heat
bath during the entire protocol $x_0,\ldots,x_\tau$, as in [17].
Work is done during a work step, as the external signal
changes from $x_{t-1}$ to $x_t$ [17, 18]:

$$W[s_{t-1}; x_{t-1} \to x_t] := E(s_{t-1},x_t) - E(s_{t-1},x_{t-1}) . \tag{1}$$

In response to this change, the system relaxes from $s_{t-1}$
to $s_t$ in a relaxation step. The total work over the course
of a driving protocol is $W = \sum_{t=1}^{\tau} W[s_{t-1}; x_{t-1} \to x_t]$.
The total change in energy, $\Delta E := E(s_\tau,x_\tau) - E(s_0,x_0) = W + Q$,
equals the total work plus the total heat,
$Q = \sum_{t=1}^{\tau} [E(s_t,x_t) - E(s_{t-1},x_t)]$, flowing into the
system during the relaxation steps.
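
The bookkeeping of work and heat along a single trajectory can be sketched as follows. The two-state energy function and the particular trajectory are hypothetical, chosen only to show that the work steps and relaxation steps together account for the total energy change, $\Delta E = W + Q$.

```python
def energy(s: int, x: float) -> float:
    """Hypothetical energy function: state 1 is lowered by the signal x."""
    return -x if s == 1 else 0.0

def work_and_heat(states, signal):
    """Accumulate work (Eq. 1) and heat along one trajectory.

    states = [s_0, ..., s_tau], signal = [x_0, ..., x_tau].
    """
    work = heat = 0.0
    for t in range(1, len(signal)):
        # work step: signal changes x_{t-1} -> x_t at fixed state s_{t-1}
        work += energy(states[t - 1], signal[t]) - energy(states[t - 1], signal[t - 1])
        # relaxation step: state changes s_{t-1} -> s_t at fixed signal x_t
        heat += energy(states[t], signal[t]) - energy(states[t - 1], signal[t])
    return work, heat

states = [0, 1, 1, 0]
signal = [0.0, 1.0, 2.0, 0.5]
W, Q = work_and_heat(states, signal)
delta_E = energy(states[-1], signal[-1]) - energy(states[0], signal[0])
```

Because each step changes either the signal or the state, but never both, the two sums telescope to the total energy change for any trajectory.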

For now, we assume that the kernel which describes
the dynamics, $p(s_t|s_{t-1},x_t)$, is fixed. However, the conditional
distribution over states after the work step but
before the system relaxes, $p(s_{t-1}|x_t)$, changes as a function
of time, as does the conditional distribution over
states after the relaxation step, $p(s_t|x_t)$. In general, these
distributions are not the same, and neither of them is an
equilibrium distribution. Under Markovian system dynamics,
the probability before a relaxation step simplifies
to

$$p(s_{t-1}|x_t) = \langle p(s_{t-1}|x_0,\ldots,x_{t-1}) \rangle_{p(x_0,\ldots,x_{t-1}|x_t)} , \tag{2}$$

and the distribution after relaxation is given by

$$p(s_t|x_t) = \langle p(s_t|s_{t-1},x_t) \rangle_{p(s_{t-1}|x_t)} . \tag{3}$$

Angled brackets with a subscripted probability $p$ denote
an average over $p$.

The equilibrium distribution is the same function, before
and after relaxation, $p_{\rm eq}(s|x_t) := e^{-\beta (E(s,x_t) - F_t)}$,
where $s$ refers to the state of the system with energy
$E(s,x_t)$, and $F_t := F[x_t]$ denotes equilibrium free energy.
The probability of a specific path through the system's
state space, given the protocol, is

$$P_{S|X} = p_{\rm eq}(s_0|x_0) \prod_{t=1}^{\tau} p(s_t|s_{t-1},x_t) , \tag{4}$$

and the joint probability is

$$P_{S,X} := p(s_0,\ldots,s_\tau,x_0,\ldots,x_\tau) = p(x_0)\, p_{\rm eq}(s_0|x_0) \prod_{t=1}^{\tau} p(x_t|x_0,\ldots,x_{t-1})\, p(s_t|s_{t-1},x_t) . \tag{5}$$

In the following, unless otherwise clear from the context,
angled brackets without a subscript denote an average
over the distribution $P_{S,X}$.

Dissipation out of equilibrium.- After the conclusion
of the protocol, the probability over system states is
given by $p(s_\tau|x_\tau)$, in general not an equilibrium distribution.
Then, in addition to the equilibrium free energy,
$F_\tau$, there is another contribution to the free energy, because
the system is not in thermodynamic equilibrium.
This additional free energy would be dissipated as heat
to the environment if the system were to relax to thermodynamic
equilibrium. For Markovian system dynamics,
the additional nonequilibrium contribution is proportional [13]
to the relative entropy (Kullback-Leibler
divergence) between the actual distribution $p(s_\tau|x_\tau)$ at
the end of the protocol and the equilibrium distribution,

$$F^{\rm add}_\tau[p(s_\tau|x_\tau)] = k_B T \, D_{\rm KL}[p(s_\tau|x_\tau) \,\|\, p_{\rm eq}(s_\tau|x_\tau)] . \tag{6}$$

The non-negative Kullback-Leibler (KL) divergence [19]
between distributions $p(x)$ and $q(x)$ is defined as

$$D_{\rm KL}[p(x) \,\|\, q(x)] := \left\langle \ln \frac{p(x)}{q(x)} \right\rangle_{p(x)} \geq 0 . \tag{7}$$
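
For discrete distributions the KL divergence is a short computation. The sketch below assumes NumPy, works in nats, and assumes that $q(x) > 0$ wherever $p(x) > 0$; the example distributions are arbitrary.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL[p || q] in nats for discrete distributions on the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p(x) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.7, 0.2, 0.1]
q = [1 / 3, 1 / 3, 1 / 3]
d_pq = kl_divergence(p, q)   # divergence of p from the uniform distribution
d_qp = kl_divergence(q, p)   # generally differs: the KL divergence is not symmetric
```

The divergence vanishes only when the two distributions coincide, which is why it serves as a measure of distance from equilibrium in Eq. (6).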



The sum of both contributions to the free energy
constitutes the overall nonequilibrium free energy,
$F_{\rm neq}[p(s_\tau|x_\tau)] = F_\tau + F^{\rm add}_\tau[p(s_\tau|x_\tau)]$. Here, nonequilibrium
free energy is defined in analogy to the standard
equilibrium free energy as a functional of the probability
distribution, but applied to any probability distribution
[20-22], that is to any $p(s|x)$:

$$F_{\rm neq}[p(s|x)] := \langle E(s,x) \rangle_{p(s|x)} + k_B T \langle \ln p(s|x) \rangle_{p(s|x)} . \tag{8}$$
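
The decomposition of the nonequilibrium free energy into the equilibrium free energy plus $k_B T$ times a KL divergence can be checked numerically. The following sketch assumes NumPy, a three-state system with hypothetical energies at a fixed signal value, and units in which $k_B T = 1$.

```python
import numpy as np

def nonequilibrium_free_energy(p, energies, kT=1.0):
    """Eq. (8): <E>_p + k_B T <ln p>_p for a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    energies = np.asarray(energies, dtype=float)
    mask = p > 0
    return float(np.sum(p * energies) + kT * np.sum(p[mask] * np.log(p[mask])))

kT = 1.0
energies = np.array([0.0, 1.0, 2.5])       # hypothetical E(s, x) at fixed x
boltzmann = np.exp(-energies / kT)
F_eq = -kT * np.log(boltzmann.sum())       # equilibrium free energy F[x]
p_eq = boltzmann / boltzmann.sum()         # equilibrium distribution

p = np.array([0.2, 0.5, 0.3])              # arbitrary nonequilibrium distribution
F_add = kT * np.sum(p * np.log(p / p_eq))  # Eq. (6): k_B T * D_KL[p || p_eq]
F_neq = nonequilibrium_free_energy(p, energies, kT)
```

Evaluating the same functional at `p_eq` returns `F_eq`, so the additional contribution is exactly the part that vanishes in equilibrium.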

The average work irretrievably lost over the course of a
driving protocol,

$$\langle W_{\rm diss} \rangle_{P_{S|X}} := \langle W \rangle_{P_{S|X}} - \Delta F_{\rm neq} , \tag{9}$$

equals the average work performed on the system minus
the nonequilibrium free energy change
$\Delta F_{\rm neq} := F_{\rm neq}[p(s_\tau|x_\tau)] - F_{\rm neq}[p(s_0|x_0)]$. We can compare
this to the average excess work for a given protocol,
$\langle W_{\rm ex} \rangle_{P_{S|X}} := \langle W \rangle_{P_{S|X}} - \Delta F$, the total work done on the
system in excess of the equilibrium free energy change
$\Delta F := F_\tau - F_0$, which would be the work done if the driving
signal changed quasistatically (infinitely slowly), and
hence the system remained in thermodynamic equilibrium
throughout the protocol. This excess work equals
the dissipated work only when the protocol includes a
final equilibration of the system.

Since the system starts in equilibrium, the total change
in nonequilibrium free energy is the equilibrium free energy
change plus the abovementioned additional contribution
to the free energy, $\Delta F_{\rm neq} = \Delta F + F^{\rm add}_\tau[p(s_\tau|x_\tau)]$.
The dissipation is then the excess work minus this additional
nonequilibrium contribution,

$$\langle W_{\rm diss} \rangle_{P_{S|X}} = \langle W_{\rm ex} \rangle_{P_{S|X}} - F^{\rm add}_\tau[p(s_\tau|x_\tau)] \leq \langle W_{\rm ex} \rangle_{P_{S|X}} . \tag{10}$$

Later, we derive a lower bound on the dissipation and excess
work averaged over all protocols, denoted by $\langle W_{\rm diss} \rangle$
and $\langle W_{\rm ex} \rangle$, respectively.

Each of the incremental work steps, $x_t \to x_{t+1}$, is accompanied
by a nonequilibrium free energy change given
by $\Delta F_{\rm neq}[x_t \to x_{t+1}] := F_{\rm neq}[p(s_t|x_{t+1})] - F_{\rm neq}[p(s_t|x_t)]$,
so that the average dissipation during each work step is

$$\langle W_{\rm diss}[x_t \to x_{t+1}] \rangle := \langle W[s_t; x_t \to x_{t+1}] \rangle_{p(s_t,x_t,x_{t+1})} - \langle \Delta F_{\rm neq}[x_t \to x_{t+1}] \rangle_{p(x_t,x_{t+1})} . \tag{11}$$

The nonequilibrium free energy change during each relaxation
step is

$$\Delta F_{\rm neq}[x_t; s_{t-1} \to s_t] = F^{\rm add}_t[p(s_t|x_t)] - F^{\rm add}_t[p(s_{t-1}|x_t)] , \tag{12}$$

which equals $k_B T$ times the change in KL divergence from the
equilibrium distribution.
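
A minimal sketch of one relaxation step at a fixed signal value, assuming NumPy and units with $k_B T = 1$. The Metropolis kernel is one hypothetical choice of dynamics that leaves the equilibrium distribution invariant; the text does not prescribe a particular kernel. The sketch applies Eq. (3) and evaluates the change in KL divergence that enters Eq. (12).

```python
import numpy as np

def metropolis_kernel(energies, beta=1.0):
    """Transition matrix K[i, j] = p(s_t = j | s_{t-1} = i) obeying detailed balance."""
    n = len(energies)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                # propose j uniformly among the other states, accept with the Metropolis rule
                K[i, j] = min(1.0, np.exp(-beta * (energies[j] - energies[i]))) / (n - 1)
        K[i, i] = 1.0 - K[i].sum()  # remaining probability: stay in state i
    return K

def kl(p, q):
    """D_KL[p || q] in nats."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

energies = np.array([0.0, 1.0, 2.0])          # hypothetical E(s, x_t) at fixed x_t
p_eq = np.exp(-energies) / np.exp(-energies).sum()
K = metropolis_kernel(energies)

p_before = np.array([0.1, 0.1, 0.8])          # p(s_{t-1} | x_t), after the work step
p_after = p_before @ K                        # Eq. (3): distribution after relaxation
delta_F_relax = kl(p_after, p_eq) - kl(p_before, p_eq)  # Eq. (12) in units of k_B T
```

Since the kernel leaves `p_eq` unchanged, the KL divergence cannot increase under it, so `delta_F_relax` is nonpositive.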

Predictive power, memory, and dissipation.- The
system state and the external signal are random
variables that potentially share information,
$I[s_t,x_t] := \langle \ln[p(s_t,x_t)/p(s_t)p(x_t)] \rangle_{p(s_t,x_t)}$, where
$p(s_t) = \langle p(s_t|x_t) \rangle_{p(x_t)}$. Mutual information [23] measures
the reduction in uncertainty about the outcome of a random
variable upon learning the value of another variable,
and it is symmetric: $I[s_t,x_t] = H[s_t] - H[s_t|x_t] = H[x_t] - H[x_t|s_t]$.
Uncertainty is quantified by the entropy,
$H[s_t] := -\langle \ln p(s_t) \rangle_{p(s_t)}$, and the conditional entropy,
$H[s_t|x_t] := -\langle \ln p(s_t|x_t) \rangle_{p(s_t,x_t)}$, respectively.
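
These information measures can be evaluated directly from a joint probability table. The sketch below assumes NumPy and a hypothetical 2x2 joint distribution `joint[s, x]`; it uses the identity $I[s,x] = H[s] + H[x] - H[s,x]$ and the chain rule $H[s|x] = H[s,x] - H[x]$.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mutual_information(joint):
    """I[s, x] in nats from a joint table joint[s, x]."""
    joint = np.asarray(joint, dtype=float)
    p_s = joint.sum(axis=1)
    p_x = joint.sum(axis=0)
    return entropy(p_s) + entropy(p_x) - entropy(joint)

joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
I_sx = mutual_information(joint)
# conditional entropies via the chain rule H[s|x] = H[s,x] - H[x]
H_s_given_x = entropy(joint) - entropy(joint.sum(axis=0))
H_x_given_s = entropy(joint) - entropy(joint.sum(axis=1))
```

Both `entropy(joint.sum(axis=1)) - H_s_given_x` and `entropy(joint.sum(axis=0)) - H_x_given_s` reproduce `I_sx`, which is the symmetry stated above.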

The system transition probability, $p(s_t|s_{t-1},x_t)$, is assumed
to depend on the current signal value $x_t$ and
system state $s_{t-1}$. These two dependencies are sufficient
to induce correlations between the system's current
state and previous signal values. The memory the
system keeps about the external signal can then be quantified
by the information that the system state $s_t$ retains
about a trajectory $\{x_{t-\tau_m},\ldots,x_t\}$. In general, there
are temporal correlations in the input signal, and hence,
there can be correlations between $s_t$ and future signal
values. That is, some of the memory retained in the
system's state is information about the future trajectory
$\{x_{t+1},\ldots,x_{t+\tau_f}\}$. Here we focus on the instantaneous
memory, $I_{\rm mem}(t) := I[s_t,x_t]$, and the instantaneous
predictive power [11], $I_{\rm pred}(t) := I[s_t,x_{t+1}] = H[x_{t+1}] - H[x_{t+1}|s_t]$.

The implicit model computed by the system's dynamics,
which map signal $x_t$ onto state $s_t$, given the current
state $s_{t-1}$, contains the probabilistic map $p(x_{t+1}|s_t)$,
which represents the prediction of $x_{t+1}$, given the value
of $s_t$.

The instantaneous nonpredictive information is defined
as the difference between instantaneous memory and predictive
power, $I_{\rm mem}(t) - I_{\rm pred}(t)$. It represents useless
nostalgia and provides a measure for the ineffectiveness
of the model.

Averaging the nonequilibrium free energy over protocols
allows us to write

$$\beta \langle F_{\rm neq}[p(s|x)] \rangle_{p(x)} = \beta \langle E(s,x) \rangle_{p(s,x)} - H[s|x] . \tag{13}$$



Combining this with Eqs. (1) and (11) [21], we arrive at
our first result: the instantaneous nonpredictive information
is proportional to the average work dissipated while
the signal changes from $x_t$ to $x_{t+1}$,

$$\beta \langle W_{\rm diss}[x_t \to x_{t+1}] \rangle = I_{\rm mem}(t) - I_{\rm pred}(t) . \tag{14}$$
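
The identity in Eq. (14) is algebraic and can be checked numerically for any joint distribution over $(s_t, x_t, x_{t+1})$. The sketch below assumes NumPy and uses a randomly drawn joint table and energy table. Both are hypothetical and not generated by any physical dynamics, so only the equality is tested, not the sign of either side.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 1.0
n_s, n_x = 3, 4

# Arbitrary joint distribution p(s_t, x_t, x_{t+1}) and energy table E(s, x).
joint = rng.random((n_s, n_x, n_x))
joint /= joint.sum()
E = rng.normal(size=(n_s, n_x))

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(p_sx):
    return entropy(p_sx.sum(1)) + entropy(p_sx.sum(0)) - entropy(p_sx)

def avg_beta_f_neq(p_sx):
    """Eq. (13): beta <F_neq> = beta <E> - H[s|x] for a joint table p_sx[s, x]."""
    h_s_given_x = entropy(p_sx) - entropy(p_sx.sum(0))
    return beta * np.sum(p_sx * E) - h_s_given_x

p_now = joint.sum(axis=2)    # p(s_t, x_t)
p_next = joint.sum(axis=1)   # p(s_t, x_{t+1})

# Average work of Eq. (1): the signal moves x_t -> x_{t+1} at fixed state s_t.
avg_work = np.sum(joint * (E[:, None, :] - E[:, :, None]))

# Eq. (11): dissipation = work minus nonequilibrium free energy change.
beta_w_diss = beta * avg_work - (avg_beta_f_neq(p_next) - avg_beta_f_neq(p_now))
nostalgia = mutual_information(p_now) - mutual_information(p_next)
```

The energy terms cancel between the work and the free energy change, leaving $H[s_t|x_{t+1}] - H[s_t|x_t]$, which is the difference of the two mutual informations.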



In summary, the unwarranted retention of past informa- 
tion is fundamentally equivalent to energetic inefficiency. 

Lower bound on total dissipation.- We now relate
the total average dissipated work during the entire
protocol, averaged over all protocols, $\langle W_{\rm diss} \rangle$,
to the total nostalgia, $I_{\rm mem} - I_{\rm pred}$, given by the
difference between the total instantaneous memory,
$I_{\rm mem} := \sum_{t=0}^{\tau-1} I_{\rm mem}(t)$, and the total instantaneous
predictive power, $I_{\rm pred} := \sum_{t=0}^{\tau-1} I_{\rm pred}(t)$. To that end we
need to combine Eqs. (11) and (14), and sum over all
time steps. This sum includes a sum over changes in
nonequilibrium free energy, which can be expressed as

$$\left\langle \sum_{t=0}^{\tau-1} \Delta F_{\rm neq}[x_t \to x_{t+1}] \right\rangle = \langle \Delta F_{\rm neq} \rangle - \langle \Delta F^{\rm relax}_{\rm neq} \rangle \tag{15}$$

in terms of the total nonequilibrium free energy change,
$\langle \Delta F_{\rm neq} \rangle$, and the sum of nonequilibrium free energy
changes during relaxation steps:

$$\langle \Delta F^{\rm relax}_{\rm neq} \rangle := \left\langle \sum_{t=0}^{\tau-1} \Delta F_{\rm neq}[x_{t+1}; s_t \to s_{t+1}] \right\rangle \leq 0 . \tag{16}$$

This quantity is nonpositive because, on average, during
relaxation steps the system evolves toward equilibrium.
The total dissipation then becomes, using Eq. (14),

$$\beta \langle W_{\rm diss} \rangle = I_{\rm mem} - I_{\rm pred} - \beta \langle \Delta F^{\rm relax}_{\rm neq} \rangle . \tag{17}$$



The total nostalgia therefore provides a lower bound on
the total average dissipation, and also, due to Eq. (10),
on the total average excess work,

$$I_{\rm mem} - I_{\rm pred} \leq \beta \langle W_{\rm diss} \rangle \leq \beta \langle W_{\rm ex} \rangle . \tag{18}$$
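
The chain of inequalities can be illustrated on a small exactly solvable example. The sketch below assumes NumPy, a two-state system driven by a two-valued Markov signal, and Metropolis relaxation dynamics. The energies, signal transition matrix, kernel, and protocol length are all hypothetical choices, and the signal is taken to be Markovian only so that the joint distribution of $(s_t, x_t)$ can be propagated exactly.

```python
import numpy as np

beta, tau = 1.0, 5
E = np.array([[0.0, 2.0],     # E[s, x]: hypothetical energies of a two-state
              [1.5, 0.0]])    # system under a two-valued driving signal
T = np.array([[0.9, 0.1],     # T[x, x']: Markov transition matrix of the signal
              [0.2, 0.8]])
p_x0 = np.array([0.5, 0.5])

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(p_sx):
    return entropy(p_sx.sum(1)) + entropy(p_sx.sum(0)) - entropy(p_sx)

def beta_f_neq(p_sx):
    """Protocol-averaged beta * F_neq of Eq. (13) for a joint table p_sx[s, x]."""
    return beta * np.sum(p_sx * E) - (entropy(p_sx) - entropy(p_sx.sum(0)))

def metropolis(energies):
    """Kernel K[s, s'] in detailed balance with exp(-beta * energies)."""
    n = len(energies)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                K[i, j] = min(1.0, np.exp(-beta * (energies[j] - energies[i]))) / n
        K[i, i] = 1.0 - K[i].sum()
    return K

K = np.stack([metropolis(E[:, x]) for x in range(E.shape[1])])  # K[x, s, s']
F_eq = -np.log(np.exp(-beta * E).sum(axis=0)) / beta            # F[x]
p_eq = np.exp(-beta * (E - F_eq))                               # p_eq(s | x)

p_sx = p_eq * p_x0            # start in equilibrium: p(s_0, x_0)
f_start = beta_f_neq(p_sx)
work = nostalgia = relax = 0.0
for t in range(tau):
    joint = p_sx[:, :, None] * T[None, :, :]        # p(s_t, x_t, x_{t+1})
    p_work = joint.sum(axis=1)                      # p(s_t, x_{t+1}) after the work step
    work += np.sum(joint * (E[:, None, :] - E[:, :, None]))
    nostalgia += mutual_information(p_sx) - mutual_information(p_work)
    p_next = np.einsum("sx,xsr->rx", p_work, K)     # relaxation step, Eq. (3)
    relax += beta_f_neq(p_next) - beta_f_neq(p_work)
    p_sx = p_next

p_x_end = p_sx.sum(axis=0)
beta_w_diss = beta * work - (beta_f_neq(p_sx) - f_start)            # Eq. (9), averaged
beta_w_ex = beta * work - beta * (p_x_end @ F_eq - p_x0 @ F_eq)     # work minus <Delta F>
```

In this sketch `beta_w_diss` equals `nostalgia - relax`, as in Eq. (17), and the ordering `nostalgia <= beta_w_diss <= beta_w_ex` of Eq. (18) follows because `relax` is nonpositive and the final KL divergence from equilibrium is nonnegative.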



We can use this result to refine Landauer's principle [25],
which states that any erasure of information
must be balanced by an increase in entropy elsewhere.
The information erased during a protocol, such as the
reset protocol of Landauer [25], is the entropy change
$\mathcal{I}_e := H[s_0|x_0] - H[s_\tau|x_\tau]$. Note that the information
erased here is not mutual information about the driving
signal, but rather information that could have potentially
been extracted from the system by some measurement
process. Landauer pointed out that the erasure of information
requires heat to flow out of the system, which
obeys (using the first and second laws of thermodynamics,
and Eqs. (9) and (13))

$$-\beta \langle Q \rangle = \mathcal{I}_e + \beta \langle W_{\rm diss} \rangle \geq \mathcal{I}_e . \tag{19}$$

Substituting our result from Eq. (17) into Eq. (19) yields
the new relation

$$-\beta \langle Q \rangle = \mathcal{I}_e + I_{\rm mem} - I_{\rm pred} - \beta \langle \Delta F^{\rm relax}_{\rm neq} \rangle . \tag{20}$$

Thus (using Eq. (16)) we obtain a refinement of Landauer's
principle,

$$-\beta \langle Q \rangle \geq \mathcal{I}_e + I_{\rm mem} - I_{\rm pred} , \tag{21}$$

where the bound is augmented by the total nostalgia.
The system dynamics of a computing device that retains
memory therefore must be maximally predictive to approach
Landauer's limit.
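
One way to see where the equality in Eq. (19) comes from is the following short derivation. It is a sketch that assumes the first law in the form $\Delta E = W + Q$, the definition of dissipation in Eq. (9) averaged over protocols, and the protocol-averaged free energy of Eq. (13); the erased information is written as $\mathcal{I}_e := H[s_0|x_0] - H[s_\tau|x_\tau]$.

```latex
\begin{align}
-\beta\langle Q\rangle
  &= \beta\langle W\rangle - \beta\langle \Delta E\rangle
     && \text{(first law)} \\
  &= \beta\langle W_{\rm diss}\rangle + \beta\langle \Delta F_{\rm neq}\rangle
     - \beta\langle \Delta E\rangle
     && \text{(Eq.~(9), averaged over protocols)} \\
  &= \beta\langle W_{\rm diss}\rangle
     - \big( H[s_\tau|x_\tau] - H[s_0|x_0] \big)
     && \text{(Eq.~(13))} \\
  &= \beta\langle W_{\rm diss}\rangle + \mathcal{I}_e .
\end{align}
```

The inequality in Eq. (19) then requires only that the average dissipation is nonnegative, which is the second law.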

Discussion.- The dynamics, $p(s_t|s_{t-1},x_t)$, have been
assumed fixed for any given system. However, biological
systems are typically adapted to their environment.
One can then ask if there is a simple principle underlying
the process producing this adaptation. If such a principle
exists, then it may reflect the importance of energetic efficiency,
because of the resulting competitive advantage for
reproducing organisms. While other criteria play a role,
such as robustness and sensitivity, energetic efficiency is
of fundamental relevance. This is exemplified by biological
molecules that harness environmental fluctuations to
accomplish energetically costly downstream tasks. The
more efficiently such a molecule can operate, the more
it can accomplish through coupling to endergonic chemical
reactions or mechanical actions. For example, with
more efficient coupling to the environment, the molecular
motor kinesin can carry larger cargos. Likewise, with
greater efficiency the cytochrome c oxidase complex, an enzyme
that pumps protons across a membrane, can create
stronger electrochemical gradients. Evidence for the
importance of energetic efficiency is furthermore found
in biomolecular machines that approach 100% efficiency
when driven in a natural fashion: the stall torque for
the $F_1$-ATPase [26] and the stall force for myosin V [27]
are near the maximal values possible given the free energy
liberated by ATP hydrolysis and the sizes of their
respective rotations and steps.

These and many other biological functions require 
some correspondence between the environment and the 
systems that implement them. Therefore the memory of 
their instantiating systems must be nonzero. We have 
shown that any such system with nonzero memory must 
conduct predictive inference, at least implicitly, to ap- 
proach maximal energetic efficiency. 

A substantial amount of work has sought to relate
emerging biological functions and behaviors to efficient
energy usage. Examples range from animal behavior
(e.g. [28]) to single neurons, where recently researchers
have proposed that the minimization of energy expenditure,
subject to constraints given by the desired function,
may be "a unifying principle governing neuronal
biophysics" [25]. On the other hand, there is much research
on optimal information processing in neurons. For
a recent review see e.g. [30], which proposes that the extraction
of predictive information in biological signal processing
may constitute, or at least lead to, a general principle.
By directly relating memory and predictive power
to dissipation out of equilibrium, the results we have derived
here indicate that these two important paradigms
are deeply connected.

While it is perhaps intuitive that neurons and organ- 
isms should have to implement predictive inference to 
function well, our results have the striking implication 
that on all scales energetic efficiency calls for predictive 
inference. This includes the smallest biological units, 
such as molecular machines. 

Our results also specify the required kind of predictive
inference: maximization of predictive power at a desired
level of system memory, as in [11]. This connects
with work on optimal predictive inference algorithms discussed
in [11, 12, 21, 31], and references therein. We
envision implementing these algorithms in fast and efficient
hardware. The results we have derived here could
then be used to choose the energetically most efficient
implementation among the many possible choices.

Conclusion.- We argued that dissipation far from 
thermodynamic equilibrium is given by average work mi- 
nus nonequilibrium free energy change. We also argued 
that the nonpredictive part of a system's memory pro- 
vides a natural measure for the inefficiency of a system's 
implicit model of its environment. 

We showed that instantaneous nonpredictive informa- 
tion is proportional to the energy dissipated when an ex- 
ternal driving signal changes by an incremental amount, 
thereby doing work on the system. This result demon- 
strates the intimate connection between prediction and 
energetic efficiency. Summed over the entire protocol, the 
total nonpredictive information provides a lower bound
on the total dissipation.

These results imply that any system which is built to 
have nonzero memory has to be predictive in order to 
allow for minimal possible dissipation, i.e. to operate at 
maximal energetic efficiency. Our results furthermore 
allow for a refinement of Landauer's principle, applied to 
systems away from thermodynamic equilibrium. 

We have provided a connection between nonequilib- 
rium thermodynamics and learning theory, by making 
precise how two important aspects of life are fundamen- 
tally related: making a predictive model of the environ- 
ment and using available energy efficiently. 